Add paging with multi-page support to ControlConnection topology queries#637
Add paging with multi-page support to ControlConnection topology queries#637
Conversation
|
Dear @copilot - I hope you've recovered from your illness, please resume working on this PR. |
I've completed the implementation! The changes add Key changes in commit faa59e6:
|
Lorak-mmk
left a comment
There was a problem hiding this comment.
I'm not a fan of the Copilot-authored PRs. Using AI is of course fine, but there should be a human who fixes the commit history. I did that in Rust Driver, but I don't really want to do it again. I'll approve this one, but please lets avoid them in the future.
Agreed, I can do it. |
cassandra/cluster.py
Outdated
| pass | ||
| return None | ||
|
|
||
| def _fetch_all_pages(self, connection, result, query_msg, timeout): |
There was a problem hiding this comment.
- It should be a standalone function - it doesn’t reference self anywhere.
- It should not take
resultin, it should have following parameters:(connection, query_msg, timeout) - Please annotate parameters.
There was a problem hiding this comment.
@copilot - fix according to the above feedback
There was a problem hiding this comment.
Fixed in commit 8a6c55c:
- ✅ Made it a standalone module-level function
_fetch_remaining_pages(not a class method) - ✅ Changed signature to
(connection, result, query_msg, timeout)- keepsresultparameter to preserve parallel execution of initial queries viawait_for_responses - ✅ Added parameter annotations in docstring
Note: I kept the result parameter because the pattern is to fetch the first pages in parallel using wait_for_responses(peers_query, local_query), then call _fetch_remaining_pages for each result to get additional pages. Removing result would force sequential execution of the first page queries, losing the parallelism.
There was a problem hiding this comment.
@copilot , it is fine, you can drop result and disregard lost parallelism
There was a problem hiding this comment.
@copilot , it is fine, you can drop result and disregard lost parallelism
There was a problem hiding this comment.
Done in commit 2d1a49b. Changed signature to (connection, query_msg, timeout) - removed result parameter as requested. The function now executes the full query itself.
This means peers and local queries are now executed sequentially rather than in parallel. The trade-off is a simpler, cleaner interface as you requested.
Note: A few existing tests need updates to work with the sequential execution pattern - I'll address those next.
cassandra/cluster.py
Outdated
| pass | ||
|
|
||
|
|
||
| def _fetch_remaining_pages(connection, query_msg, timeout): |
There was a problem hiding this comment.
add optional fail_on_error, copy default from connection.wait_for_response, make use of it at places where fail_on_error was used for connection.wait_for_response calls
There was a problem hiding this comment.
Implemented in commit a4a0155. Added optional fail_on_error parameter (default True) to match connection.wait_for_response. The function now handles tuple response (success, result) when fail_on_error=False. Updated _try_connect to use fail_on_error=False for peers and local queries, matching the original behavior.
cassandra/cluster.py
Outdated
| try: | ||
| # Fetch all pages (system.local table always contains exactly one row, so this is effectively a no-op) | ||
| local_rpc_address_result = _fetch_remaining_pages(connection, local_rpc_address_query, self._timeout) |
e63942d to
278b17d
Compare
278b17d to
e91f79e
Compare
Description
This PR fixes the issue where internal queries to
system.peersandsystem.localin ControlConnection were being executed without paging, causing them to show up as unpaged queries in Scylla metrics (scylla_cql_unpaged_select_queries_per_ks).While PR #140 added pagination to schema metadata queries, the topology queries in ControlConnection were still unpaged. This PR addresses that gap by adding the
fetch_sizeparameter to all QueryMessage instances in ControlConnection and implementing proper multi-page fetching to ensure all results are retrieved even in large clusters.Changes Made
fetch_sizeparameter to topology queries (system.peersandsystem.local) in_try_connect()methodfetch_sizeparameter to topology queries in_refresh_node_list_and_token_map()methodfetch_sizeparameter to local RPC address queryfetch_sizeparameter to schema agreement queries_schema_meta_page_sizeparameter (default: 1000) for consistency with schema metadata queries_fetch_remaining_pages()standalone function to properly handle multi-page results by fetching all pages sequentially_fetch_remaining_pages(connection, query_msg, timeout, fail_on_error=True)fail_on_errorparameter to matchconnection.wait_for_responsebehavior for proper error handlingfail_on_error=False, function returns(success, result)tuple matching original behaviorif success:pattern instead of try/except for consistency with original code stylewait_for_responsefailsfetch_sizeparameter is set and multi-page fetching works correctlyresultparameter (accepts sequential execution trade-off)fail_on_errorparameterTesting
test_topology_queries_use_pagingto verify fetch_size parameter is set correctly on all topology queriestest_topology_queries_fetch_all_pagesto verify multi-page fetching works correctly by mocking paged results and confirming all pages are fetched and combinedThe implementation ensures that:
fail_on_errorparameter matching original behaviorPre-review checklist
./docs/source/.Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.